Seed and prompt relationship for Stable Diffusion
The same prompt produces this wide a range of outputs depending on the random number seed.
https://gyazo.com/8a371a254a1acabfdb6bd23d08da3b85
These 18 images were all generated from the same prompt.
Yet the compositions are completely different.
Why is this?
The initial value is just random noise.
From there, repeated denoising steps move it closer to "the distribution of images that humans find meaningful."
Since the starting points are completely different, the places we arrive at are completely different too.
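As a minimal sketch of why the seed matters: the seed only determines the initial noise tensor, so the same seed reproduces the same starting point and a different seed gives a completely different one. The shape and function name here are hypothetical, not Stable Diffusion's actual implementation.

```python
import numpy as np

# Hypothetical sketch: the seed fixes nothing but the initial noise
# tensor (for Stable Diffusion, a latent on the order of 4x64x64 values).
def initial_latent(seed, shape=(4, 64, 64)):
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

a = initial_latent(42)
b = initial_latent(42)   # same seed -> identical starting point
c = initial_latent(43)   # different seed -> completely different start

print(np.allclose(a, b))  # True
print(np.abs(a - c).mean())  # large average gap between the two starts
```

Everything downstream of this starting point (the denoising trajectory) is deterministic given the prompt, which is why the seed dominates the final composition.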
https://gyazo.com/8c4bf57630dab4cdd0e0de2796368568
In this schematic it's drawn in two dimensions, but in reality there are about 20,000 dimensions.
Almost all of the probability mass sits on a surface.
We cannot directly observe the distribution of "the set of things humans can recognize as pictures" in 20,000-dimensional space.
In high-dimensional space, a normal distribution concentrates almost entirely on the surface of a hypersphere.
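This concentration is easy to check numerically: the norm of a d-dimensional standard normal sample is almost always very close to sqrt(d), i.e. the samples live in a thin shell. The dimension 20,000 below just follows the number used in the text.

```python
import numpy as np

# A standard normal in d dimensions concentrates near radius sqrt(d):
# nearly every sample lies in a thin shell around the hypersphere.
rng = np.random.default_rng(0)
d = 20_000
samples = rng.standard_normal((1000, d))
radii = np.linalg.norm(samples, axis=1)

print(np.sqrt(d))                  # expected radius, ~141.4
print(radii.mean())                # very close to sqrt(d)
print(radii.std() / radii.mean())  # tiny relative spread, ~0.5%
```

So "a random seed" effectively means "a random point on this huge hypersphere," and two random points on it are almost always far apart.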
Another Perspective
https://gyazo.com/ef49b57a95f9a4a9019a508887b4f642
Here, each vertical column starts from a common random seed,
and each horizontal row is generated from a common prompt.
There are people experimenting with prompts on AI drawing services by trial and error.
Those who don't fix the seed and try different prompts make this kind of observation:
https://scrapbox.io/files/6331c923f88990002370e1c0.png
The picture is determined almost entirely by the random seed.
Looking at this and trying to find a connection between prompts A through E is hopeless.
People find meaning in random events, so some read a false "prompt effect" into what are really "random seed effects."
Some have even written blog posts concluding "you should experiment with the seed fixed."
Those who fix the seed and change the prompt make this kind of observation:
https://scrapbox.io/files/6331c928d802d7001d8ea097.png
This looks easier to interpret than the previous example.
"I'm discarding E because it doesn't produce a decent picture."
However, trying the same prompts with a different seed gives this result:
https://scrapbox.io/files/6331c92cf86836001d73190c.png
I showed both to my wife, who knew nothing about the context.
My own impression is that E is quite good in both.
Her impressions: in the top row, A and E are good, and she likes E better.
In the bottom row, A, D, and E are good, in the order D > A > E; E is well drawn too, but she doesn't like the face.
In other words, even if you "fix the seed and observe with different prompts," what you learn is only "how good or bad each prompt is on that particular seed."
It's like practicing one specific map in a game with random map generation.
There's no guarantee that know-how earned there will be useful on other maps,
because what is "good" on one seed doesn't match what is "good" on another.
If you tried a prompt on more seeds, you might find "it just happened to be bad the first time; it's actually surprisingly good."
But because you underestimate it, you never "try more."
This makes it impossible to properly estimate the probability of success.
In reinforcement learning terms, this is the question of how to avoid such pessimistic misjudgments by putting enough weight on exploration (the exploration–exploitation trade-off). So: generate with multiple seeds × multiple prompts and observe.
https://gyazo.com/58dadfb5bca69f105b112f852016c24a
A has a good chance of being decent.
E is sometimes bizarre, but sometimes very good.
C is consistently weird.
What you observe are differences in probability distributions like these.
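A toy simulation makes the point concrete: if each prompt has a hidden probability of producing a "good" image on a random seed, a single seed gives you only a 0-or-1 observation, while many seeds estimate the actual rate. The hit rates below are invented for illustration, not measured values.

```python
import numpy as np

# Toy model (assumed numbers): each prompt has a hidden "good image"
# probability per random seed; one seed tells you almost nothing.
rng = np.random.default_rng(1)
hit_rate = {"A": 0.5, "C": 0.05, "E": 0.3}  # hypothetical

def observe(prompt, n_seeds):
    # one Bernoulli draw per seed: did that seed yield a "good" image?
    return rng.random(n_seeds) < hit_rate[prompt]

one_seed = {p: observe(p, 1).mean() for p in hit_rate}      # 0.0 or 1.0
many_seeds = {p: observe(p, 500).mean() for p in hit_rate}  # ~ true rate
print(one_seed)    # noisy: any prompt can look great or terrible
print(many_seeds)  # close to the underlying probabilities
```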
This kind of knowledge is far more useful than seed-fixed trial-and-error knowledge.
It may become meaningless with future model upgrades.
How much of it depends on the particular language model, and how much reflects the structure of language itself,
we will find out as more models are released.
What is Thompson sampling?
Sample from each option's posited distribution and try the option whose sample is largest.
Options with large variance still get tried occasionally.
C proves not useful and is automatically discarded, while A and E get new trials.
The shape of each distribution is automatically updated with the data gained from each trial.
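The scheme above can be sketched as Beta-Bernoulli Thompson sampling. The "good image" rates and the prompt labels are assumptions for illustration; in practice the feedback would come from the human "Like" labels described below.

```python
import numpy as np

# Minimal Thompson sampling sketch with Beta-Bernoulli posteriors.
# Each prompt's "good image" rate is unknown; keep a Beta(wins+1,
# losses+1) posterior per prompt, sample one rate from each posterior,
# and try the prompt whose sampled rate is largest.
rng = np.random.default_rng(2)
true_rate = {"A": 0.5, "C": 0.05, "E": 0.3}  # hidden, hypothetical
wins = {p: 0 for p in true_rate}
losses = {p: 0 for p in true_rate}
pulls = {p: 0 for p in true_rate}

for _ in range(2000):
    # sample a plausible rate for each prompt from its posterior
    draws = {p: rng.beta(wins[p] + 1, losses[p] + 1) for p in true_rate}
    p = max(draws, key=draws.get)       # try the most promising prompt
    good = rng.random() < true_rate[p]  # simulated human "good/bad" label
    wins[p] += good
    losses[p] += not good
    pulls[p] += 1

print(pulls)  # C is tried rarely; A accumulates most of the trials
```

Because C's posterior quickly concentrates near zero, it stops winning the sampled-argmax step and is "automatically discarded," while A and E keep being explored in proportion to how plausible their high rates remain.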
Q: Thompson sampling needs feedback on the evaluation results; who provides it?
A: Humans do it.
About 1,500 images are generated per day; my wife and I look through them and label them "this one is good, this one is bad," yielding roughly 100 "good" and 1,400 "bad."
At first I improved the prompts based on those results by hand, but it became too much of a hassle,
so I thought "with this much data, it can be automated," and automated it.
So currently it works like: if you like something, just click the "Like" button and more good stuff will appear!
Q: You mean it tries more of the prompts tied to what you marked as good?
A: Yes.
The explanation here is a bit muddled, so I'll elaborate later.
Q: Is there much point in refining individual prompts in depth?
A: I think so.
If you want to pull a good picture, in the end you're pulling a gacha.
Because txt2img is, after all, a method that starts from completely random initial values.
If you want more control, you'd have to use img2img.
Q: Is it hard to get a good image even if you're doing everything right?
A: Well, it's luck.
Without a definition of what counts as a "good image," the "probability of getting a good one" is unknown.
If you pull a gacha with unknown odds, you may or may not get a good one; it's "luck"!
Q: What is the range of values a seed can take?
A: Any integer from 0 to 4294967295 (a 32-bit value), since it's just a random number seed.
---
This page is auto-translated from /nishio/Stable Diffusionのシードとプロンプトの関係 using DeepL. If you see something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.